A Proofs and Derivation A.1 Proof for Theorem

Neural Information Processing Systems

We follow the notation in Alg. 3 of Argmax Flow. We can expand the determinant along the i-th row. This is illustrated in Figure A.1, where the adaptive [...] Further details can be found in Tables A.2. Furthermore, we will make the code used to reproduce these results publicly available. Different state encoders were used in different environments: an MLP encoder for the discrete control tasks and a CNN encoder for the Pistonball task.
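
For reference, expanding a determinant along the i-th row is the standard Laplace (cofactor) expansion. The excerpt does not show the matrix involved, so the symbols below (a generic n-by-n matrix A with entries a_{ij}, and the minor M_{ij} obtained by deleting row i and column j) are assumptions used only for illustration:

```latex
\det(A) \;=\; \sum_{j=1}^{n} (-1)^{i+j}\, a_{ij}\, M_{ij},
\qquad
M_{ij} \;=\; \det\bigl(A \text{ with row } i \text{ and column } j \text{ removed}\bigr).
```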


Toggling stiffness via multistability

Oliveira, Hugo de Souza, Curatolo, Michele, Sachse, Renate, Milana, Edoardo

arXiv.org Artificial Intelligence

Mechanical metamaterials enable unconventional and programmable mechanical responses through structural design rather than material composition. In this work, we introduce a multistable mechanical metamaterial that exhibits a toggleable stiffness effect, where the effective shear stiffness switches discretely between stable configurations. The mechanical analysis of surrogate beam models of the unit cell reveals that this behavior originates from the rotation transmitted by the support beams to the curved beam, which governs the balance between bending and axial deformation. The stiffness ratio between the two states of the unit cell can be tuned by varying the slenderness of the support beams or by incorporating localized hinges that modulate rotational transfer. Experiments on 3D-printed prototypes validate the numerical predictions, confirming consistent stiffness toggling across different geometries. Finally, we demonstrate a monolithic soft clutch that leverages this effect to achieve programmable, stepwise stiffness modulation. This work establishes a design strategy for toggleable stiffness using multistable metamaterials, paving the way for adaptive, lightweight, and autonomous systems in soft robotics and smart structures.



Deliberate Planning in Language Models with Symbolic Representation

Xiong, Siheng, Liu, Zhangding, Zhou, Jieyu, Su, Yusen

arXiv.org Artificial Intelligence

Planning remains a core challenge for large language models (LLMs), particularly in domains that require coherent multi-step action sequences grounded in external constraints. We introduce SymPlanner, a novel framework that equips LLMs with structured planning capabilities by interfacing them with a symbolic environment that serves as an explicit world model. Rather than relying purely on natural language reasoning, SymPlanner grounds the planning process in a symbolic state space, where a policy model proposes actions and a symbolic environment deterministically executes and verifies their effects. To enhance exploration and improve robustness, we introduce Iterative Correction (IC), which refines previously proposed actions by leveraging feedback from the symbolic environment to eliminate invalid decisions and guide the model toward valid alternatives. Additionally, Contrastive Ranking (CR) enables fine-grained comparison of candidate plans by evaluating them jointly. Conceptually, SymPlanner operationalizes two cognitive faculties: (i) error monitoring and repair via externalized feedback (IC) and (ii) preference formation among alternatives via pairwise comparison (CR), advancing cognitively plausible, symbol-grounded planning aligned with the rich structure in intelligent systems. We evaluate SymPlanner on PlanBench, demonstrating that it produces more coherent, diverse, and verifiable plans than pure natural language baselines.
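
The propose, verify, and correct loop described in this abstract can be illustrated with a toy sketch. Everything here is hypothetical and is not SymPlanner's implementation: the block-stacking environment, the `step` and `propose_with_correction` functions, and the hard-coded `toy_policy` (standing in for an LLM policy) are assumptions made only to show how environment feedback can rule out invalid actions.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[Tuple[str, ...], ...]   # stacks of blocks, each listed bottom to top
Action = Tuple[str, int]              # (block to move, index of destination stack)
Feedback = List[Tuple[Action, str]]   # rejected actions with the environment's reason


def step(state: State, action: Action) -> Tuple[State, Optional[str]]:
    """Toy symbolic environment: apply the action deterministically or explain why not."""
    block, dest = action
    if not 0 <= dest < len(state):
        return state, f"stack {dest} does not exist"
    for i, stack in enumerate(state):
        if stack and stack[-1] == block:          # only the top block may move
            if i == dest:
                return state, f"{block} is already on stack {dest}"
            stacks = [list(s) for s in state]
            stacks[i].pop()
            stacks[dest].append(block)
            return tuple(tuple(s) for s in stacks), None
    return state, f"{block} is not on top of any stack"


def propose_with_correction(
    state: State,
    policy: Callable[[State, Feedback], Action],
    max_retries: int = 3,
) -> Tuple[Optional[Action], State, Feedback]:
    """Ask the policy for an action; on rejection, re-ask with the accumulated feedback."""
    feedback: Feedback = []
    for _ in range(max_retries + 1):
        action = policy(state, feedback)
        new_state, error = step(state, action)
        if error is None:
            return action, new_state, feedback
        feedback.append((action, error))
    return None, state, feedback


def toy_policy(state: State, feedback: Feedback) -> Action:
    """Hypothetical stand-in for a learned policy: skip actions already rejected."""
    rejected = {action for action, _ in feedback}
    for candidate in [("A", 1), ("B", 1)]:
        if candidate not in rejected:
            return candidate
    return ("A", 0)


start: State = (("A", "B"), ())       # B sits on top of A; the second stack is empty
action, new_state, feedback = propose_with_correction(start, toy_policy)
print(action, new_state, feedback)
```

In this toy run the first proposal tries to move a covered block, the environment rejects it with a reason, and the second proposal is accepted. Comparing complete candidate plans against each other, as the abstract describes for Contrastive Ranking, is not covered by this sketch.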




Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL

Zurek, Matthew, Zamir, Guy, Chen, Yudong

arXiv.org Machine Learning

We study offline reinforcement learning in average-reward MDPs, which presents increased challenges from the perspectives of distribution shift and non-uniform coverage, and has been relatively underexamined from a theoretical perspective. While previous work obtains performance guarantees under single-policy data coverage assumptions, such guarantees utilize additional complexity measures which are uniform over all policies, such as the uniform mixing time. We develop sharp guarantees depending only on the target policy, specifically the bias span and a novel policy hitting radius, yielding the first fully single-policy sample complexity bound for average-reward offline RL. We are also the first to handle general weakly communicating MDPs, contrasting restrictive structural assumptions made in prior work. To achieve this, we introduce an algorithm based on pessimistic discounted value iteration enhanced by a novel quantile clipping technique, which enables the use of a sharper empirical-span-based penalty function. Our algorithm also does not require any prior parameter knowledge for its implementation. Remarkably, we show via hard examples that learning under our conditions requires coverage assumptions beyond the stationary distribution of the target policy, distinguishing single-policy complexity measures from previously examined cases. We also develop lower bounds nearly matching our main result.
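
As a rough illustration of the general idea of pessimistic discounted value iteration, the sketch below subtracts an uncertainty penalty from the empirical Bellman backup so that poorly covered state-action pairs look unattractive. It is not the algorithm of this paper: the count-based penalty `c / sqrt(n)`, the function name, and all parameters are assumptions, and the quantile clipping technique and empirical-span-based penalty mentioned in the abstract are not implemented.

```python
import numpy as np


def pessimistic_value_iteration(counts, rewards, gamma=0.9, c=1.0, iters=500):
    """Generic pessimistic value iteration on an empirical tabular model.

    counts[s, a, s2]: number of observed transitions in the offline dataset.
    rewards[s, a]:    known rewards in [0, 1] (an assumption of this sketch).
    """
    num_states = counts.shape[0]
    v_max = 1.0 / (1.0 - gamma)
    n = counts.sum(axis=2)                                   # visits per (s, a)
    safe_n = np.maximum(n, 1)
    # Empirical transition model; unvisited pairs fall back to a uniform guess.
    p_hat = np.where(n[..., None] > 0, counts / safe_n[..., None], 1.0 / num_states)
    # Assumed count-based penalty; unvisited pairs get the maximal penalty.
    penalty = np.where(n > 0, c / np.sqrt(safe_n), v_max)
    v = np.zeros(num_states)
    for _ in range(iters):
        q = rewards + gamma * (p_hat @ v) - penalty          # penalized backup
        q = np.clip(q, 0.0, v_max)                           # keep values in range
        v = q.max(axis=1)
    return v, q.argmax(axis=1)


# Two states, two actions: action 0 is well covered, action 1 was never observed.
counts = np.zeros((2, 2, 2))
counts[0, 0, 0] = 100
counts[1, 0, 1] = 100
rewards = np.array([[0.5, 1.0], [0.5, 1.0]])
values, policy = pessimistic_value_iteration(counts, rewards)
print(values, policy)
```

Although action 1 has the higher nominal reward, it is absent from the data, so the penalized backup steers the greedy policy toward the covered action.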


Advanced posterior analyses of hidden Markov models: finite Markov chain imbedding and hybrid decoding

Bæk, Zenia Elise Damgaard, Macià, Moisès Coll, Skov, Laurits, Hobolth, Asger

arXiv.org Machine Learning

Two major tasks in applications of hidden Markov models are to (i) compute distributions of summary statistics of the hidden state sequence, and (ii) decode the hidden state sequence. We describe finite Markov chain imbedding (FMCI) and hybrid decoding to solve each of these two tasks. In the first part of our paper we use FMCI to compute posterior distributions of summary statistics such as the number of visits to a hidden state, the total time spent in a hidden state, the dwell time in a hidden state, and the longest run length. We use simulations from the hidden state sequence, conditional on the observed sequence, to establish the FMCI framework. In the second part of our paper we apply hybrid segmentation for improved decoding of a HMM. We demonstrate that hybrid decoding shows increased performance compared to Viterbi or Posterior decoding (often also referred to as global or local decoding), and we introduce a novel procedure for choosing the tuning parameter in the hybrid procedure. Furthermore, we provide an alternative derivation of the hybrid loss function based on weighted geometric means. We demonstrate and apply FMCI and hybrid decoding on various classical data sets, and supply accompanying code for reproducibility. Key words: Artemis analysis, decoding, finite Markov chain imbedding, hidden Markov model, hybrid decoding, pattern distributions.
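
To make the two baseline decoders named in this abstract concrete, here is a minimal sketch of Viterbi (global) decoding and posterior (local) decoding for a discrete HMM. The two-state model, its parameter values, and the function names are assumptions chosen for illustration; the hybrid decoder and its tuning parameter are not implemented here.

```python
import numpy as np


def posterior_marginals(pi, A, B, obs):
    """P(z_t = k | x_1..x_T) for every t, via scaled forward-backward."""
    T, K = len(obs), len(pi)
    alpha, beta, scale = np.zeros((T, K)), np.ones((T, K)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


def viterbi(pi, A, B, obs):
    """Most probable joint hidden path (global decoding), computed in log space."""
    T, K = len(obs), len(pi)
    with np.errstate(divide="ignore"):            # log(0) = -inf is acceptable here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A    # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):                # follow back-pointers
        path[t] = back[t + 1, path[t + 1]]
    return path


# Hypothetical two-state model with "sticky" transitions and noisy emissions.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.8, 0.2], [0.2, 0.8]])
obs = [0, 0, 1, 0, 0]

gamma = posterior_marginals(pi, A, B, obs)
global_path = viterbi(pi, A, B, obs)              # maximizes the joint path probability
local_path = gamma.argmax(axis=1)                 # maximizes each marginal separately
print(global_path, local_path)
```

Viterbi optimizes the whole path at once, while posterior decoding picks the most probable state at each position independently, so the latter can produce paths that are individually likely per position but unlikely (or even impossible) as a sequence.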